
Fix #933: raise compaction memory_limit to 4 GB #952

Merged

erikdarlingdata merged 1 commit into dev from feature/933-bump-compaction-memory-limit on May 12, 2026

Conversation

@erikdarlingdata
Owner

Summary

  • Raises memory_limit for parquet compaction from 1 GB to 4 GB in Lite/Services/ArchiveService.cs (both single-pass and pair-merge code paths; see the sketch after this list)
  • Updates the now-stale comment about spilling to describe what actually happens
  • Extends tools/CompactionRepro with stress-test scaffolding for future regressions
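
A minimal sketch of the change, assuming DuckDB.NET's ADO.NET surface; the class, method, and parameter names are illustrative, not the actual ArchiveService.cs identifiers:

```csharp
using DuckDB.NET.Data;

public static class CompactionSession
{
    // Raised from "1GB": parquet COPY allocations bypass the buffer manager
    // and cannot spill, so the cap has to cover them outright.
    private const string MemoryLimit = "4GB";

    // Both the single-pass and pair-merge paths would open their connection
    // here, so the pragma only needs to change in one place.
    public static DuckDBConnection Open(string tempDirectory)
    {
        var connection = new DuckDBConnection("DataSource=:memory:");
        connection.Open();

        using var command = connection.CreateCommand();
        command.CommandText = $"SET memory_limit = '{MemoryLimit}';";
        command.ExecuteNonQuery();

        command.CommandText = $"SET temp_directory = '{tempDirectory}';";
        command.ExecuteNonQuery();

        return connection;
    }
}
```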

Why

#942 dropped the cap to 1 GB on the theory that a tighter limit + temp_directory would force earlier spilling and keep peak working set down. That validation ran on query_stats (narrow, ~1.7M rows). The reporter's failure is on query_snapshots, which carries query_text + query_plan + live_query_plan per row. Nightly logs (#933) show OOM at 906/953 MiB used with the 1 GB cap.

A standalone reproducer pinpoints the cause: parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer manager and can't be spilled. The cap acts as a hard ceiling for those, not a spill trigger. Spill on disk = 0 MB across every configuration tested (memory_limit 1/2/4 GB; accumulator vs tournament merge; threads 1 vs 2; :memory: vs file-backed DB). The same failure reproduces in standalone DuckDB CLI v1.5.2 — an engine issue, not a binding artifact. Upstream: duckdb#16482, duckdb#10084.
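
The failure mode is easy to see in the CLI. A sketch of the reproduction — file names and the ORDER BY column are placeholders; the pragmas are real DuckDB settings:

```sql
-- None of these settings change the outcome: spill stays at 0 MB and the
-- process dies inside COPY once the un-spillable allocations hit the cap.
SET memory_limit = '1GB';
SET temp_directory = '/tmp/duckdb_spill';
SET threads = 2;

COPY (
    SELECT *
    FROM read_parquet(['chunk_*.parquet', 'monthly.parquet'])
    ORDER BY collection_time
) TO 'merged.parquet' (FORMAT PARQUET);
```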

DuckDB's OOM guide explicitly warns about this and recommends memory_limit at 50-60% of system RAM. 4 GB sits well inside that range and leaves real headroom on top of the un-spillable allocations.
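
For reference, if the cap were derived from the host rather than hardcoded, the 50-60% guidance maps to something like the sketch below. This PR ships a fixed '4GB'; GC.GetGCMemoryInfo is a real .NET API, the policy class is hypothetical:

```csharp
using System;

public static class MemoryLimitPolicy
{
    // DuckDB's OOM guide suggests memory_limit at 50-60% of system RAM.
    public static string FromSystemRam(double fraction = 0.5)
    {
        // Reflects the container/cgroup limit when one is set (.NET Core 3.0+).
        long totalBytes = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
        long capMb = (long)(totalBytes * fraction) / (1024 * 1024);
        return $"{capMb}MB"; // e.g. feeds "SET memory_limit = '16384MB';"
    }
}
```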

Reporter's actual file sizes (provided in #933): 15-25 chunks of 2-6 MB plus a 35-45 MB monthly file per group, totaling ~97-143 MB per merge. Comfortably below where 4 GB has any trouble.

Test plan

  • dotnet build Lite/PerformanceMonitorLite.csproj -c Release — clean
  • Reproducer at tools/CompactionRepro succeeds at 4 GB on synthetic data shaped like query_snapshots:
    • 200 MB / 15 chunks: peak WS 129 MB
    • 1.5 GB / 15 chunks: peak WS 396 MB
  • Reproducer fails at 1 GB on the same data (matches reporter's nightly error string)
  • Reporter confirms next nightly resolves the OOM on their 4-server setup
  • Upstream comment posted to duckdb#16482 with the synthetic reproducer

Notes

  • tools/CompactionRepro gained --strategy {accumulator|tournament}, --db-mode {memory|file}, --merge-files, --synthetic (with --synthetic-rows and --synthetic-plan-kb), and --cycles for leak testing. Kept so a future regression in this area is easy to reproduce; see the example invocation below.
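
A hypothetical invocation — the flag names are the ones listed above, but the values and exact argument shapes are illustrative:

```bash
dotnet run --project tools/CompactionRepro -c Release -- \
  --strategy tournament --db-mode memory \
  --synthetic --synthetic-rows 200000 --synthetic-plan-kb 64 \
  --cycles 3
```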

🤖 Generated with Claude Code

#942 lowered the cap to 1 GB on the theory that a tight memory_limit plus
temp_directory would force DuckDB to spill earlier and keep peak working
set down. That validation ran against query_stats (narrow, ~1.7M rows) and
showed peak 1236 MB → 166 MB. The reporter's actual failure is on
query_snapshots, which carries query_text + query_plan + live_query_plan
per row. With the 1 GB cap, the nightly logs show OOM at "906/953 MiB
used" before any merge progress.

The standalone reproducer (tools/CompactionRepro) confirms the cause:
parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer
manager and can't be spilled. The cap acts as a hard ceiling for those,
not a spill trigger. Spill on disk = 0 MB across every configuration we
tested (memory_limit 1/2/4 GB, accumulator vs tournament merge, threads
1 vs 2, :memory: vs file-backed DB). The same failure reproduces in
standalone DuckDB CLI v1.5.2, so it's an engine issue — see upstream
issues duckdb#16482 and discussion#10084.

DuckDB's own OOM guide explicitly warns about this case and recommends
memory_limit at 50-60% of system RAM, not a tight cap. 4 GB sits well
inside that range for typical workstation/server hosts and leaves real
headroom on top of the un-spillable allocations.

Reporter's actual file sizes (15-25 chunks of 2-6 MB plus a 35-45 MB
monthly file per group) are well below the level where 4 GB has any
trouble. The reproducer confirms 4 GB succeeds on a synthetic
query_snapshots-shaped dataset of ~1.5 GB with peak working set of
~400 MB; the reporter's data is ~143 MB at worst.

Also updates the stale comment about spilling — temp_directory was set
per #935 but the buffer-manager-bypassing allocations don't use it. The
comment now describes what actually happens.

The tools/CompactionRepro changes add --strategy {accumulator|tournament},
--db-mode {memory|file}, --merge-files, --synthetic data generation, and
--cycles for leak testing. These are kept so a future regression in this
area can be reproduced and diagnosed quickly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 06db2d7 into dev May 12, 2026
2 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/933-bump-compaction-memory-limit branch May 12, 2026 04:49
